point in time. This expression involves that individual’s predictor values and the regression
coefficients. Next, the software constructs a longer expression that includes the likelihood of
getting exactly the observed survival times for all the participants in the data set. And if this isn’t
already complicated enough, the expression has to deal with the issue of censored data. At this
point, the software seeks to find the values of the regression coefficients that maximize this very
long likelihood expression (similar to the way maximum likelihood is described with logistic
regression in Chapter 18).
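The description above is deliberately informal. For the Cox PH model specifically, the quantity that is maximized is Cox's partial likelihood. Each observed death contributes the probability that it was that participant, out of everyone still at risk at that moment, who died. Censored participants enter only through those "at risk" sets. The following is a minimal Python sketch under stated assumptions: a single predictor, no tied death times, and Newton-Raphson as the maximizer. The function names and the toy data are hypothetical and do not come from any statistics package.

```python
import math

def _risk_set(times, t):
    """Indices of participants still under observation (not yet dead or censored) at time t."""
    return [j for j, tj in enumerate(times) if tj >= t]

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one predictor, assuming no tied death times.
    events[i] is 1 for an observed death, 0 for a censored time."""
    ll = 0.0
    for i, t in enumerate(times):
        if events[i] == 1:
            risk = _risk_set(times, t)
            ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return ll

def fit_cox_one_predictor(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson search for the beta that maximizes the partial likelihood."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i, t in enumerate(times):
            if events[i] != 1:
                continue  # censored participants contribute only via the risk sets
            risk = _risk_set(times, t)
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            mean = sum(wj * x[j] for wj, j in zip(w, risk)) / s0
            var = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0 - mean ** 2
            score += x[i] - mean  # first derivative of the log-likelihood
            info += var           # minus the second derivative
        if info <= 0:
            raise ValueError("partial likelihood has no curvature; cannot fit")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical toy data: x = 1 for smokers, 0 for nonsmokers
times = [2, 3, 5, 7, 8, 10, 12, 15]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 1, 1, 0, 1, 0, 0, 0]
beta_hat = fit_cox_one_predictor(times, events, x)
print(beta_hat, math.exp(beta_hat))  # fitted coefficient and its hazard ratio
```

Real software also handles tied death times and many predictors at once, and it reports standard errors. This sketch only shows the core idea of maximizing a likelihood that accounts for censoring.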
Hazard ratios
Hazard ratios (HRs) are the estimates of relative risk obtained from PH regression. HRs in survival
regression play a role similar to the one odds ratios play in logistic regression. They’re also
calculated the same way from regression output, by exponentiating the regression coefficients:
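In logistic regression: $\text{Odds Ratio} = e^{\text{Regression Coefficient}}$
In PH regression: $\text{Hazard Ratio} = e^{\text{Regression Coefficient}}$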
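Here is a minimal Python sketch of that conversion (the function names are hypothetical). Going from a coefficient to its ratio is a single exponential, and going back is a natural logarithm. The same two functions serve both model types:

```python
import math

def ratio_from_coefficient(coef):
    """Exponentiate a regression coefficient: an odds ratio for a logistic
    coefficient, or a hazard ratio for a PH coefficient."""
    return math.exp(coef)

def coefficient_from_ratio(ratio):
    """Inverse: the natural log of an OR or HR recovers the coefficient."""
    return math.log(ratio)

# A hypothetical PH coefficient of 0.0488 corresponds to an HR of about 1.05
hr = ratio_from_coefficient(0.0488)
print(round(hr, 3))  # prints 1.05
```

A coefficient of 0 gives a ratio of exactly 1, meaning the predictor has no effect.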
Keep in mind that hazard is the chance of dying in any small period of time. For each
predictor variable in a PH regression model, a coefficient is produced that, when
exponentiated, equals the HR. The HR tells you the factor by which the hazard rate is multiplied
when you increase the variable’s value by exactly 1.0 unit. For a yes/no predictor, that means
comparing participants positive for the predictor with the comparison group. Therefore, an HR’s
numerical value depends on the units in which the variable is expressed in your data. And for
categorical predictors, interpreting the HR depends on how you code the categories.
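The following sketch illustrates the point about coding. It assumes a hypothetical 0/1 smoking indicator with a made-up coefficient. Recoding the indicator the other way round ($x' = 1 - x$) flips the sign of the coefficient, because the constant term this recoding creates is absorbed into the baseline hazard. As a result, the HR becomes its reciprocal:

```python
import math

# Hypothetical coefficient for a 0/1 indicator coded 1 = smoker, 0 = nonsmoker
beta_smoker = 0.9758
hr_smoker_vs_non = math.exp(beta_smoker)

# Recoding as 1 = nonsmoker, 0 = smoker (x' = 1 - x) negates the coefficient,
# so the HR for the same comparison, read the other way, is the reciprocal
beta_nonsmoker = -beta_smoker
hr_non_vs_smoker = math.exp(beta_nonsmoker)

print(round(hr_smoker_vs_non, 2), round(hr_non_vs_smoker, 2))
```

Both numbers describe the same data. Which one you report depends only on which category you chose as the reference.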
For example, if a survival regression model in a study of emphysema patients includes number of
cigarettes smoked per day as a predictor of survival, and if the HR for this variable comes out equal to
1.05, then a participant’s chances of dying at any instant increase by a factor of 1.05 (5 percent) for
every additional cigarette smoked per day. A 5 percent increase may not seem like much, but it’s
applied for every additional cigarette per day. A person who smokes one pack (20 cigarettes) per day
has that 1.05 multiplication applied 20 times, which is like multiplying by
$1.05^{20}$, which equals about 2.65.
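You can check the compounding with a few lines of Python. Multiplying by 1.05 once per cigarette, twenty times over, gives the same result as raising 1.05 to the 20th power:

```python
hr_per_cigarette = 1.05
hazard_multiplier = 1.0
for _ in range(20):  # one factor of 1.05 per additional cigarette per day
    hazard_multiplier *= hr_per_cigarette
print(round(hazard_multiplier, 2))  # prints 2.65
```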
One pack contains 20 cigarettes, so if you change the units in which you record smoking levels from
cigarettes per day to packs per day, you would use units that are 20 times larger. In that case, the
corresponding regression coefficient is 20 times larger, and the HR is raised to the 20th power (2.65
instead of 1.05 in this example).
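In coefficient terms, the change of units is a simple rescaling. This sketch assumes, for illustration, that the per-cigarette HR of 1.05 is exact, so its coefficient is $\ln 1.05$. Multiplying the coefficient by 20 and then exponentiating gives the per-pack HR:

```python
import math

coef_per_cigarette = math.log(1.05)      # coefficient when x = cigarettes per day
coef_per_pack = 20 * coef_per_cigarette  # same model with x = packs per day
hr_per_pack = math.exp(coef_per_pack)    # equals 1.05 ** 20, about 2.65
print(round(coef_per_cigarette, 4), round(coef_per_pack, 4), round(hr_per_pack, 2))
```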
A two-pack-per-day smoker’s hazard is 2.65 times that of a one-pack-per-day smoker, whose hazard
is in turn 2.65 times that of a nonsmoker. This translates to an increase by a factor of
$2.65^2 \approx 7.0$ (approximately sevenfold) in the chances of dying at any
instant for the two-pack-per-day smoker compared to a nonsmoker.
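The same arithmetic extends to any number of packs. The sketch below uses illustrative group labels and computes each group's hazard relative to a nonsmoker as the per-pack HR raised to the number of packs, using the unrounded per-pack HR. The baseline hazard cancels out of every ratio:

```python
hr_per_pack = 1.05 ** 20  # about 2.65
packs = {"nonsmoker": 0, "one pack": 1, "two packs": 2}

# Hazard of each group relative to a nonsmoker
relative = {name: hr_per_pack ** p for name, p in packs.items()}

two_vs_one = relative["two packs"] / relative["one pack"]
two_vs_none = relative["two packs"] / relative["nonsmoker"]
print(round(two_vs_one, 2), round(two_vs_none, 1))  # prints 2.65 7.0
```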
Executing a Survival Regression
As with all statistical methods dealing with time-to-event data, your dependent variable is actually a
pair of variables: